Discriminative Topic Mining for Social Spam Detection
Authors
Abstract
In the era of the Social Web, there has been explosive growth in user-contributed comments posted to online social media. However, the increasing number of misleading and deceptive user comments on these platforms has become a great concern for consumers and merchants, and social spam has drawn the attention of the legal community in recent years. Social spam can cause tremendous losses to both consumers and merchants, so there is a pressing need for effective methodologies that detect social spam and maintain the hygiene of online social media. The main contribution of this paper is a novel social spam detection methodology that combines word-, topic-, and user-based features. In particular, the proposed methodology is underpinned by the Labeled Latent Dirichlet Allocation (L-LDA) model, a probabilistic generative model. A series of experiments on social comments posted to YouTube shows that the proposed methodology achieves a detection accuracy of 91.17%. The business implication of our research is that merchants can apply our methodology to filter spam and thereby extract accurate market intelligence from online social media. Moreover, social media site owners can use the proposed methodology to maintain the hygiene of their sites.
Similar resources
Making the Most of Tweet-Inherent Features for Social Spam Detection on Twitter
Social spam produces a great amount of noise on social media services such as Twitter, which reduces the signal-to-noise ratio that both end users and data mining applications observe. Existing techniques for social spam detection have focused primarily on the identification of spam accounts by using extensive historical and network-based data. In this paper we focus on the detection of spam twee...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet dominate users' lives, they bring not only business opportunities and time savings but also threats to hardware and software, such as information theft and system intrusion. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...
Detecting Pharmaceutical Spam in Microblog Messages
Microblogs are one of a growing group of social network tools. Twitter is, at present, one of the most popular forums for microblogging in online social networks, and the fastest growing. Fifty million messages flow through servers, computers, and cell phones on a wide variety of topics exchanged daily. With this considerable volume, Twitter is a natural and obvious target for spreading spam vi...
SocialSpamGuard: A Data Mining-Based Spam Detection System for Social Media Networks
We have entered the era of social media networks represented by Facebook, Twitter, YouTube and Flickr. Internet users now spend more time on social networks than search engines. Business entities or public figures set up social networking pages to enhance direct interactions with online users. Social media systems heavily depend on users for content contribution and sharing. Information is spre...
Learning to Represent Review with Tensor Decomposition for Spam Detection
Review spam detection is a key task in opinion mining. To accomplish this type of detection, previous work has focused mainly on effectively representing fake and non-fake reviews with discriminative features, which are discovered or elaborately designed by experts or developers. This paper proposes a novel review spam detection method that learns the representation of reviews automatically ins...
Publication date: 2014